AITopics | leader reward

leader reward

Rack Position Optimization in Large-Scale Heterogeneous Data Centers

Chen, Chang-Lin, Chen, Jiayu, Lan, Tian, Zhao, Zhaoxia, Dong, Hongbo, Aggarwal, Vaneet

arXiv.org Artificial Intelligence, Mar-31-2025

As rapidly growing AI computational demands accelerate the need for new hardware installation and maintenance, this work explores optimal data center resource management by balancing operational efficiency with fault tolerance through strategic rack positioning considering diverse resources and locations. Traditional mixed-integer programming (MIP) approaches often struggle with scalability, while heuristic methods may result in significant sub-optimality. To address these issues, this paper presents a novel two-tier optimization framework using a high-level deep reinforcement learning (DRL) model to guide a low-level gradient-based heuristic for local search. The high-level DRL agent employs Leader Reward for optimal rack type ordering, and the low-level heuristic efficiently maps racks to positions, minimizing movement counts and ensuring fault-tolerant resource distribution. This approach allows scalability to over 100,000 positions and 100 rack types. Our method outperformed the gradient-based heuristic by 7% on average and the MIP solver by over 30% in objective value. It achieved a 100% success rate versus MIP's 97.5% (within a 20-minute limit), completing in just 2 minutes compared to MIP's 1630 minutes (i.e., almost 4 orders of magnitude improvement). Unlike the MIP solver, which showed performance variability under time constraints and high penalties, our algorithm consistently delivered stable, efficient results, an essential feature for large-scale data center management.

artificial intelligence, machine learning, rack type, (15 more...)

2504.00277

Genre: Research Report (1.00)

Industry: Information Technology > Services (0.95)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Leader Reward for POMO-Based Neural Combinatorial Optimization

Wang, Chaoyang, Cheng, Pengzhi, Li, Jingze, Sun, Weiwei

arXiv.org Artificial Intelligence, May-22-2024

Deep neural networks based on reinforcement learning (RL) for solving combinatorial optimization (CO) problems are developing rapidly and have shown a tendency to approach or even outperform traditional solvers. However, existing methods overlook an important distinction: CO problems differ from other traditional problems in that they focus solely on the optimal solution provided by the model within a specific length of time, rather than considering the overall quality of all solutions generated by the model. In this paper, we propose Leader Reward and apply it during two different training phases of the Policy Optimization with Multiple Optima (POMO) [Kwon et al., 2020] model to enhance the model's ability to generate optimal solutions. This approach is applicable to a variety of CO problems, such as the Traveling Salesman Problem (TSP), the Capacitated Vehicle Routing Problem (CVRP), and the Flexible Flow Shop Problem (FFSP), and it also works well with other POMO-based models and inference-phase strategies. We demonstrate that Leader Reward greatly improves the quality of the optimal solutions generated by the model. Specifically, we reduce POMO's gap to the optimum by more than 100 times on TSP100 with almost no additional computational overhead.
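The abstract does not give the formula for Leader Reward, so the following is only a minimal sketch of the general idea it describes: paying extra attention to the best solution among several rollouts of the same instance. It assumes a POMO-style shared baseline (each rollout's reward minus the mean over all rollouts) and a hypothetical bonus that multiplies the best rollout's advantage by `1 + alpha`. The function names, the `alpha` parameter, and the reward values are illustrative assumptions, not the authors' definitions.

```python
from statistics import mean


def pomo_advantages(rewards):
    """Shared-baseline advantages: reward of each rollout minus the mean reward."""
    baseline = mean(rewards)
    return [r - baseline for r in rewards]


def leader_boosted_advantages(rewards, alpha=1.0):
    """Assumed leader bonus: scale the best rollout's advantage by (1 + alpha).

    This is a hypothetical stand-in for Leader Reward; the paper's exact
    formulation may differ.
    """
    advantages = pomo_advantages(rewards)
    # The "leader" is the rollout with the highest reward (first one on ties).
    leader = max(range(len(rewards)), key=lambda i: rewards[i])
    advantages[leader] *= 1.0 + alpha
    return advantages


# Made-up rewards: negative tour lengths for four rollouts of one instance.
rewards = [-10.0, -9.0, -12.0, -9.5]
print(pomo_advantages(rewards))
print(leader_boosted_advantages(rewards, alpha=2.0))
```

Because the leader's reward is never below the mean, its advantage is non-negative, so the scaling only ever strengthens the gradient signal toward the best rollout and leaves the other rollouts unchanged.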

co problem, leader reward, pomo, (17 more...)

2405.13947

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
(5 more...)

Genre: Research Report > Experimental Study (0.93)

Industry: Transportation > Freight & Logistics Services (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Toward Addressing Human Behavior with Observational Uncertainty in Security Games

Pita, James (University of Southern California) | Yang, Rong (University of Southern California) | Tambe, Milind (University of Southern California) | John, Richard (University of Southern California)

AAAI Conferences, Aug-8-2011

Stackelberg games have recently gained significant attention for resource allocation decisions in security settings. One critical assumption of traditional Stackelberg models is that all players are perfectly rational and that the followers perfectly observe the leader’s strategy. However, in real-world security settings, security agencies must deal with human adversaries who may not always follow the utility-maximizing rational strategy. Accounting for these likely deviations is important since they may adversely affect the leader’s (security agency’s) utility. In fact, a number of behavioral game-theoretic models have begun to emerge for these domains. Two such models in particular are COBRA (Combined Observability and Bounded Rationality Assumption) and BRQR (Best Response to Quantal Response), which have both been shown to outperform game-theoretic optimal models against human adversaries within a security setting based on Los Angeles International Airport (LAX). Under perfect observation conditions, BRQR has been shown to be the leading contender for addressing human adversaries. In this work, we explore these models under limited observation conditions. Due to human anchoring biases, BRQR’s performance may suffer under limited observation conditions. An anchoring bias is when, given no information about the occurrence of a discrete set of events, humans will tend to assign an equal weight to the occurrence of each event (a uniform distribution). This study makes three main contributions: (i) we incorporate an anchoring bias into BRQR to improve performance under limited observation; (ii) we explore finding appropriate parameter settings for BRQR under limited observation; (iii) we compare BRQR’s performance versus COBRA under limited observation conditions.
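The anchoring bias and quantal response described above can be illustrated with a small sketch. The abstract gives no formulas, so two forms are assumed here: a linear blend of the true coverage with a uniform distribution, weighted by a hypothetical parameter `alpha`, and a logit choice rule with a hypothetical rationality parameter `lam`. The function names and all numbers are illustrative assumptions, not the paper's model or settings.

```python
import math


def anchored_belief(coverage, alpha):
    """Blend the true coverage with a uniform prior over targets.

    alpha = 0 models perfect observation; alpha = 1 models an adversary
    who assigns equal weight to every target.
    """
    n = len(coverage)
    return [(1.0 - alpha) * x + alpha / n for x in coverage]


def attacker_utilities(coverage, rewards, penalties):
    """Expected attacker utility per target: penalty if covered, reward if not."""
    return [x * p + (1.0 - x) * r for x, r, p in zip(coverage, rewards, penalties)]


def quantal_response(utilities, lam):
    """Logit choice probabilities; lam = 0 is uniform, large lam approaches best response."""
    top = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp(lam * (u - top)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


# Made-up three-target example with a single defender resource.
coverage = [0.7, 0.3, 0.0]
perceived = anchored_belief(coverage, alpha=0.5)
utilities = attacker_utilities(perceived, rewards=[5.0, 3.0, 1.0], penalties=[-2.0, -1.0, -1.0])
print(perceived)
print(quantal_response(utilities, lam=0.8))
```

Under these assumptions, a larger `alpha` pulls the perceived coverage toward uniform, which changes the attacker's perceived utilities and therefore the attack distribution the defender should plan against.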

adversary, artificial intelligence, machine learning, (20 more...)

Workshops at the Twenty-Fifth AAAI Conference on Artificial Intelligence

Country: North America > United States > California > Los Angeles County > Los Angeles (0.48)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation > Infrastructure & Services > Airport (0.54)
Leisure & Entertainment > Games > Computer Games (0.51)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)
